Fusion of Probabilistic Algorithms for the CLEF Domain Specific Task

Author

  • Ray R. Larson
Abstract

This extended abstract describes the Berkeley 1 participation in the Domain Specific task for CLEF 2005. This year we submitted the minimum number of entries for each subtask (3 monolingual runs, 6 bilingual runs, and 3 multilingual runs). In our runs we employed retrieval algorithms and data fusion methods that have performed relatively well in some other retrieval contexts, but which will almost surely be abandoned in later attempts at CLEF. The main technique being tested is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. In the following paragraphs we briefly describe the indexing and term extraction methods used, followed by a description of the retrieval algorithms and data fusion methods. Since this is the first time that the Cheshire system has been used for CLEF, this approach can at best be considered a very preliminary baseline test of some retrieval algorithms and approaches.

For both the monolingual and bilingual tasks we indexed the documents using the Cheshire II system. The document index entries and queries were stemmed using the Snowball stemmer. Text indexes were created for separate XML elements (such as document titles or dates) as well as for the entire document. The techniques and algorithms used for the DS task were essentially identical to those that we used for the GeoCLEF task, but without the special geographic indexes used for GeoCLEF (our GeoCLEF track paper describes the algorithms and approaches in detail).

For the bilingual and multilingual search tasks we used combinations of up to three different MT systems for query translation: the L&H PC-based system, SYSTRAN (via Babelfish), and PROMT. The translations were combined into a single probabilistic query. The hope was to overcome the translation errors of a single system by including alternatives. However, for translation to Russian from German and English, only the PROMT MT system was available.
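The general idea of merging several query translations and fusing scores from searches against different document components can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the Cheshire II implementation: the BM25 variant (with `k1`, `b` defaults and a smoothed IDF), the min-max normalization, the summed-score fusion rule, and all function names and sample documents are hypothetical choices made for the example. The abstract does not specify the fusion formula, and stemming and the Logistic Regression ranker are omitted.

```python
import math
from collections import Counter


def merge_translations(*translations):
    """Concatenate the tokens of several translations into one query.

    Terms that several MT systems agree on appear more than once and
    therefore contribute more to the score (an assumption of this sketch).
    """
    merged = []
    for text in translations:
        merged.extend(text.lower().split())
    return merged


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score tokenized documents (dict: id -> token list) with a common BM25 variant."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[doc_id] = score
    return scores


def min_max(scores):
    """Rescale scores to [0, 1] so runs from different indexes are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse_by_sum(*runs):
    """Fuse runs by summing normalized scores per document (CombSUM-style)."""
    fused = Counter()
    for run in runs:
        for doc_id, score in min_max(run).items():
            fused[doc_id] += score
    return dict(fused)


# Hypothetical sample data: a title index and a full-text index for three records.
titles = {
    "d1": ["labour", "market", "policy"],
    "d2": ["youth", "unemployment"],
    "d3": ["family", "sociology"],
}
full_text = {
    "d1": ["labour", "market", "policy", "and", "unemployment", "statistics"],
    "d2": ["youth", "unemployment", "in", "europe", "among", "young", "people"],
    "d3": ["family", "sociology", "survey", "of", "households"],
}

# Two hypothetical MT outputs for the same source-language topic.
query = merge_translations("youth unemployment", "unemployment of young people")
fused = fuse_by_sum(bm25_scores(query, titles), bm25_scores(query, full_text))
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked)
```

Normalizing before summing keeps a component with a larger raw score range (here, the full-text index) from dominating the fused ranking. Other fusion rules could be substituted in `fuse_by_sum` without changing the rest of the sketch.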

Similar resources

Multi-Focus Image Fusion in DCT Domain using Variance and Energy of Laplacian and Correlation Coefficient for Visual Sensor Networks

The purpose of multi-focus image fusion is to gather the essential information and the focused parts from the input multi-focus images into a single image. These multi-focus images are captured with different camera depths of focus. Many multi-focus image fusion techniques have been introduced that consider focus measurement in the spatial domain. However, the multi-focus image ...

Cheshire II at GeoCLEF: Fusion and Query Expansion for GIR

In this paper I will describe the Berkeley (group 1) approach to the GeoCLEF task for CLEF 2005. The main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. Since this is t...

Dublin City University at CLEF 2007: Cross Language Speech Retrieval (CL-SR) Experiments

The Dublin City University participated in the CLEF 2007 CL-SR English task. For CLEF 2007 we concentrated primarily on the issues of topic translation, combining this with search field combination and pseudo relevance feedback methods used for our CLEF 2006 submissions. Topics were translated into English using the Yahoo! BabelFish free online translation service combined with domain-specific ...

The Xtrieval Framework at CLEF 2008: Domain-Specific Track

This article describes the architecture and configuration of the XTRIEVAL (eXtensible reTRIeval and EVALuation) framework. A first prototype is described in [1]. For CLEF 2007 a second prototype was implemented which was focused on the cross-language aspect. Runs for all subtasks of the Domain-Specific track were submitted. The performance of our submitted runs was on average compared to other ...

Construction, expression, purification and characterization of secretin domain of PilQ and triple PilA-related disulfide loop peptides fusion protein from Pseudomonas aeruginosa

Objective(s): Infection with Pseudomonas aeruginosa has been a long-standing obstacle for clinical therapy due to the complexity of its genetics and pathogenesis, as well as widespread resistance to antibiotics, which makes the search for effective vaccines for prevention and treatment important. This paper focuses on the introduction of novel Pseudomonas aeruginosa type IV pili (T4P)-ba...

Publication date: 2005